Joint Bayesian Morphology learning for Dravidian languages

Authors

  • Arun Kumar
  • Lluís Padró
  • Antoni Oliver
Abstract

In this paper, we present a methodology for learning the complex agglutinative morphology of some Indian languages using Adaptor Grammars and morphology rules. Adaptor Grammars are a compositional Bayesian framework for grammatical inference: we define a morphological grammar for agglutinative languages, and morphological boundaries are inferred from a plain-text corpus. Once morphological segmentations are produced, regular expressions for sandhi rules and orthography are applied to obtain the final segmentation. We test our algorithm on two complex languages from the Dravidian family. The same morphological model is used for both, and the results are evaluated against other state-of-the-art unsupervised morphology learning systems.
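
The post-processing step described in the abstract (regular expressions applied to an already segmented word) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the `+` boundary marker, the romanized input, the function name `apply_sandhi_rules`, and both toy rules. None of it is taken from the paper, and the rules are not claimed to be correct for any Dravidian language.

```python
import re

# Hypothetical (pattern, replacement) pairs over romanized text,
# where "+" marks a morpheme boundary proposed by the segmenter.
# These are toy placeholders, not the rules used in the paper.
SANDHI_RULES = [
    # Toy rule 1: a glide "y" between two vowels at a boundary is
    # treated as inserted by sandhi and is dropped.
    (re.compile(r"([aeiou])\+y([aeiou])"), r"\1+\2"),
    # Toy rule 2: a consonant doubled across a boundary is treated
    # as gemination and reduced to a single consonant.
    (re.compile(r"([kptc])\+\1"), r"+\1"),
]


def apply_sandhi_rules(segmented, rules=SANDHI_RULES):
    """Apply each rule in order, then split on the boundary marker."""
    for pattern, replacement in rules:
        segmented = pattern.sub(replacement, segmented)
    return segmented.split("+")


if __name__ == "__main__":
    print(apply_sandhi_rules("mazha+yilla"))
    print(apply_sandhi_rules("tak+ka"))
```

Keeping the rules as an ordered list of compiled patterns makes the rule order explicit, which matters whenever one rewrite can create or destroy the context that another rule matches.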

Similar resources

Morphology Based POS Tagging on Telugu

In this paper, we present morphology-based automatic tagging for Telugu that requires no machine learning algorithm or training data. We believe that in inflectional and agglutinating languages, the critical information required for tagging comes more from word-internal structure than from context, and we show how a well-designed morphological analyzer can assign correct tags and disa...

Unity in Diversity: A Unified Parsing Strategy for Major Indian Languages

This paper presents our work on applying a non-linear neural network to parsing five resource-poor Indian languages belonging to two major language families, Indo-Aryan and Dravidian. Bengali and Marathi are Indo-Aryan languages, whereas Kannada, Telugu and Malayalam belong to the Dravidian family. While little work has been done previously on Bengali and Telugu linear transition-based parsing, w...

Machine Translation-Indian Regional Languages

Natural Language Processing is an emerging field of Machine Learning. NLP systems deal with making use of machines to translate text or speech. MT systems can be classified according to the approach followed for translation. In this paper, existing MT systems for the regional languages of India are analyzed. Key-Words: Machine Translation (MT), Natural Language Processing (NLP...

Significance of an Accurate Sandhi-Splitter in Shallow Parsing of Dravidian Languages

This paper evaluates the challenges involved in shallow parsing of Dravidian languages, which are highly agglutinative and morphologically rich. Text processing tasks in these languages are not trivial because multiple words concatenate to form a single string, with morpho-phonemic changes at the point of concatenation. This phenomenon, known as Sandhi, in turn complicates the individual word iden...

Statistical Sandhi Splitter for Agglutinative Languages

Sandhi splitting is a primary and an important step for any natural language processing (NLP) application for languages which have agglutinative morphology. This paper presents a statistical approach to build a sandhi splitter for agglutinative languages. The input to the model is a valid string in the language and the output is a split of that string into meaningful word/s. The approach adopte...

Publication date: 2015